Department of Defense (DOD) software production and
maintenance  is  a large,  poorly  understood,  and  inefficient 
process.  Recently  Frost  and  Sullivan  (The  Military  Software 
Market,  1977)  estimated  the  yearly  cost  for  software  within 
DOD  to  be  as  large  as  $9  billion.  DeRoze  (1977)  has  also 
estimated  that  115  major  defense  systems  depend  on  software 
for  their  success.  In  an  effort  to  find  near-term  solutions 
to  software  related  problems,  the  DOD  has  begun  to  support 
research  into  the  software  production  process. 

A formal  5 year  R&D  plan  (Carlson  & DeRoze,  1977) 
related  to  the  management  and  control  of  computer  resources 
was  recently  written  in  response  to  DOD  Directive  5000.29. 
This  plan  requested  research  leading  to  the  identification 
and  validation  of  metrics  for  software  quality.  The  study 
described  in  this  paper  represents  an  experimental 
investigation  of  such  metrics,  and  is  part  of  a larger 
research  program  seeking  to  provide  valuable  information 
about  the  psychological  and  human  resource  aspects  of  the  5 
year  plan. 

DOD  is  also  initiating  the  development  of  a more 
powerful  higher  order  language  for  general  use  by  all 
services  (Department  of  Defense,  1977).  With  a language 
independent  measure  of  the  complexity  of  software,  we  can 
evaluate  not  only  program  A versus  program  B,  but  also  the 
individual  constructs  of  a language  (cf.  Gordon,  1977). 
Thus, an objective, quantitative theory based on sound
experimental data can replace idiosyncratic, subjective
evaluations of the psychological complexity of software.
Long term benefits of this effort involve improved software
system reliability and reduced development and maintenance
costs.

The  challenge  undertaken  in  this  research  program  is  to 
quantify  the  psychological  complexity  of  software.  It  is 
important  to  distinguish  clearly  between  the  psychological 
and  computational  complexity  of  software.  Computational 
complexity  refers  to  characteristics  of  algorithms  or 
programs  which  make  their  proof  of  correctness  difficult, 
lengthy,  or  impossible.  For  example,  as  the  number  of 
distinct  paths  through  a program  increases,  the 
computational  complexity  also  increases.  Psychological 
complexity  refers  to  those  characteristics  of  software  which 
make  human  understanding  of  software  more  difficult.  No 
direct  linear  relationship  between  computational  and 
psychological  complexity  is  expected.  A program  with  many 
control  paths  may  not  be  psychologically  complex.  Any 
regularity  to  the  branching  process  within  a program  may  be 
used  by  a programmer  to  simplify  understanding  of  the 
program.

Halstead  (1977)  has  recently  developed  a theory 
concerned  with  the  psychological  aspects  of  computer 
programming.  His  theory  provides  objective  estimates  of  the 
effort and time required to generate a program, the effort
required to understand a program, and the number of bugs in
a particular program (Fitzsimmons & Love, in press). Some
predictions of the theory are counterintuitive and
contradict some results of previous psychological research.
The theory has attracted attention because independent tests
of hypotheses derived from it have proven amazingly
accurate.

Although predictions of programmer behavior have been
particularly impressive, much of the research testing
Halstead's theory has been performed without sufficient
experimental or statistical controls. Further, much of the
data were based upon imprecise estimating techniques.
Nevertheless, the available evidence has been sufficient to
justify a rigorous evaluation of the theory.

Rather than initiate a research program designed
specifically to test the theory of software science, a
research strategy was chosen which would generate
suggestions for improving programmer efficiency regardless
of the success of any particular theory. This research
focuses on four phases of the software life-cycle:
understanding, modification, debugging, and construction.
Since different cognitive processes are assumed to
predominate in each phase, no single experiment or set of
experiments on a particular phase would provide sufficient
basis for making broad recommendations for improving
programmer efficiency. Each experiment in the series
comprising  this  research  program  has  been  designed  to  test 
important  variables  assumed  to  affect  a particular  phase  of 
software  development.  Professional  programmers  will  be  used 
in  these  experiments  to  provide  the  greatest  possible 
external  validity  for  the  results  (Campbell  & Stanley, 
1966).  In  addition,  Halstead's  theory  of  software  science 
ABSTRACT

This  report  describes  the  first  experiment  in  a program 
of  research  designed  to  identify  characteristics  of  computer 
software which are related to its psychological complexity.
This  experiment  evaluated  the  effect  of  three  independent 
variables  (mnemonic  variable  names,  level  of  program 
structure,  and  general  type  of  program)  on  a programmer's 
understanding  of  a computer  program.  The  contributions  of 
several  variables  to  the  prediction  of  program  understanding 
were  also  evaluated.  Significant  results  were  achieved  in  a 
pilot  study  by  Sheppard  and  Love  (1977)  using  the  materials 
and  procedures  employed  here. 

Thirty-six  experienced  programmers  were  instructed  to 
study  a computer  program  for  20  minutes,  and  were  then  given 
25  minutes  to  reconstruct  a functionally  equivalent 
program.  Performance  was  measured  by  the  percentage  of 
functionally  correct  statements  recalled.  Results  indicated 
that  level  of  program  structure  and  program  class  affected 
program understanding, while no relationship was found for
mnemonic variable names. The metrics of both Halstead and
McCabe were related to program understanding when differences
between subjects and specific programs were taken into
consideration.

Predicting  Software  Comprehensibility 
1.0  INTRODUCTION 

Programmers'  ability  to  understand  computer  programs 
may  have  substantial  impact  on  their  efficiency  in  debugging 
or  modifying  these  programs.  There  are  several  software 
engineering  practices  which  have  been  designed  to  increase 
programmers'  efficiency  in  terms  of  both  the  accuracy  and 
speed  of  their  work.  Programs  developed  in  accordance  with 
these  practices  should  be  more  easily  understood. 

Dijkstra  (1972)  suggested  that  program  construction 
should  proceed  in  a top-down  structured  fashion.  He 
contended  that  structured  programs  are  easier  to  understand, 
debug, and modify. In a study using student programmers and
textbook programs, Love (1977) found that simplified
control  flow  made  programs  easier  to  understand  for  graduate 
(but  not  for  introductory)  students.  That  study  did  not  use 
programs  which  were  strictly  structured. 

Another  standard  software  engineering  practice  is  the 
use  of  carefully  chosen  variable  names  which  serve  as 
mnemonic  aids  in  understanding  programs.  Weissman's  (1974) 
research suggested that mnemonic variable names resulted in
performance increases (up to a factor of 2). His results
need  replication  since  there  were  difficulties  with  his 
experimental  design  and  dependent  measures. 

In parallel with these attempts to improve programmer
efficiency,  several  approaches  were  developed  for  predicting 
the  psychological  complexity  of  software  algorithms.  In 
1972, Halstead first published his software physics theory
(later  renamed  software  science)  stating  that  algorithms 
have  measurable  characteristics  analogous  to  physical  laws. 
His  objective  was  to  develop  quantitative  measures  of  the 
complexity  of  computer  programs  in  terms  of  language  level, 
algorithm  purity,  programming  effort,  and  programming  time. 
Preliminary  tests  of  the  theory  have  shown  very  high 
correlations  (some  greater  than  .90)  between  his  software 
physics  metrics  and  such  dependent  measures  as  the  number  of 
bugs in programs (Funami & Halstead, 1975), programming time
(Gordon  & Halstead,  1975),  and  quality  of  programs 
(Halstead,  1973). 

There  have  been  several  recent  attempts  to  develop 
metrics  for  the  complexity  of  control  flow  through  a 
computer program (e.g., Bell & Sullivan, 1974). One of the
most  promising  of  these  metrics  was  proposed  by  McCabe 
(1976).  McCabe's  metric  will  be  used  in  this  study  as  an 
alternative  against  which  Halstead's  metrics  can  be 
compared.

A critical  issue  in  assessing  the  utility  of  these 
software engineering practices and metrics involves the
definition  of  a dependent  variable.  A model  of  a 
programmer's  understanding  of  a computer  program  is  shown  in 
Figure 1. First, a programmer must understand the overall
purpose  of  a program.  Then  an  interactive  process  begins  in 
which  successive  modules  must  be  understood  separately,  and 
then integrated into the overall flow of the program.
Measures of understanding such as quiz scores or ability to
hand simulate a program may have reflected existing
knowledge of programming methods and techniques rather than
specific knowledge of the particular program under
consideration.

Current literature (Love, 1977; Shneiderman, 1974,
1977)  suggests  that  the  most  sensitive  measure  of  whether 
people  understand  a computer  program  is  their  ability  to 
learn  a program's  structure  and  reproduce  a functionally 
equivalent  program  without  notes.  It  would  be  extremely 
difficult  to  reproduce  a non-trivial  program  without  some 
understanding  of  its  function.  The  dependent  measure 
employed  here  was  the  functional  correctness  of  a 
participant's  reconstructed  program. 

The  main  purpose  of  this  experiment  was  to  ascertain 
the  relationship  between  two  programming  style  variables  and 
the  ability  to  understand  a program.  There  was  also  an 
assessment  of  whether  comprehensibility  differed  as  a 
function  of  program  type.  In  addition,  the  relationship 
between  comprehensibility  and  three  program  metrics  (i.e., 
Halstead's E, McCabe's V(G), and the number of statements)
was  evaluated. 
2.0 METHOD

2.1  Participants 

Thirty-six  professional  programmers  were  tested  in  five 
different  locations.  Each  participant  was  a General 
Electric  employee  with  working  knowledge  of  FORTRAN.  These 
programmers  had  an  average  of  6.8  years  of  professional 
programming experience (ranging from 0 to 18 years). The
majority (n = 23) came from an engineering background, two
were  statistical  programmers,  and  nine  had  been  primarily 
involved  with  non-numeric  or  text  processing  software. 

2.2  Procedure 

A packet  of  materials  was  prepared  for  each 
participant.  The  initial  instructions  to  each  participant 
are  presented  in  Appendix  6.1.  The  written  instructions 
included  questions  on  the  extent  of  programming  experience 
and  area  of  expertise.  The  first  exercise  was  a short 
FORTRAN program with a brief description of its purpose.
All  36  participants  received  the  same  program,  which  they 
were  allowed  to  study  for  ten  minutes.  They  were  permitted 
to  make  notes  or  draw  flowcharts.  At  the  end  of  the  study 
period,  the  original  program  and  all  scrap  papers  were 
collected.  Each  participant  was  then  given  five  minutes  to 
reconstruct a functional equivalent of the program from
memory  on  a blank  sheet  of  paper,  but  was  not  required  to 
reproduce  the  comment  section. 

The  purposes  of  this  short  introductory  program  were 

   1) to provide a common basis for comparing the skills
      of the participants on this type of task, and

   2) to control for initial learning effects.

This second point is important since a previous study
(Sheppard and Love, 1977) indicated that learning may occur
during an initial task of this type.

Following this initial exercise, participants were
presented in turn with three programs which comprised the
experimental task for this study. They were allowed 25
minutes for study and 20 minutes for the reconstruction of
each program. A break of 15 minutes occurred before the
last program was presented.

2.3  Experimental  Design 

In order to control for individual differences in
performance, a within-subjects 3** fractional factorial
design was employed in this experiment (Hahn & Shapiro,
1966). Nine programs of three general classes were tested
(Table 1). Three levels of program structure were defined
for each of the nine programs, and each of these 27 versions
was presented in three levels of variable mnemonicity for a
total of 81 programs. The programs were selected from a set
of programs solicited from practicing programmers at several
GE locations.

Four sets of nine participants were used in the
experiment. The 27 participants in the first three sets
exhausted the total of 81 programs. The fourth set of 9
participants repeated one of the three previous sets. Table
1 shows the design for the first 27 participants.

TABLE 1

EXPERIMENTAL DESIGN

Note: Each number represents the assignment of one participant within the block of
27 participants (3 sets of 9 participants each).

Programmers at each location were randomly assigned to
experimental  conditions  in  order  that  over  the  course  of 
their  three  experimental  programs,  every  participant  had 
worked  with  a program  from  each  class,  and  at  each  level  of 
structure and variable mnemonicity. For example,
participant 4 received the following three programs: 1)
BESSEL - an engineering program, unstructured control flow,
medium  mnemonic  level,  2)  CHISQ  - a statistical  program, 
quasi-structured  control  flow,  least  mnemonic  level,  and  3) 
SELECT  - a non-numeric  program,  structured  control  flow, 
most  mnemonic  level.  For  simplicity  the  design  is  presented 
in  Table  1 without  regard  for  the  order  of  presentation  to 
the  participants.  One  of  the  six  possible  orders  of 
presentation  of  three  programs  was  assigned  randomly  and 
without  replacement  to  each  participant. 
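
The order assignment described above can be sketched as follows. This is only an illustration: the function name `assign_orders` is hypothetical, and the refilling of the pool after every six participants is an assumption, since the report does not say how sampling without replacement was extended beyond the first six participants.

```python
import itertools
import random


def assign_orders(n_participants, rng):
    """Deal presentation orders to participants, sampling without replacement.

    Assumption (not stated in the report): once all six orders have been
    used, the pool is refilled and reshuffled.
    """
    # The 3! = 6 possible orders of three programs.
    all_orders = list(itertools.permutations((1, 2, 3)))
    assignments = []
    pool = []
    for _ in range(n_participants):
        if not pool:
            pool = all_orders[:]
            rng.shuffle(pool)
        assignments.append(pool.pop())
    return assignments


# Seeded generator so the illustration is repeatable.
orders = assign_orders(36, random.Random(0))
```

Under this assumption each consecutive group of six participants sees every order exactly once, so order of presentation stays balanced across the 36 participants.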

2.4  Independent  Variables 
2.4.1  Program  Class 

Three  general  classes  of  programs  were  used: 
engineering,  statistical,  and  non-numeric.  Three  programs 
of  each  class  were  developed.  Appendix  6.2  describes  the 
purpose  of  the  nine  programs  and  shows  their  lengths,  which 
varied  from  36  to  57  statements.  The  programs  selected  were 
considered  to  be  representative  of  the  type  of  programs 
actually  encountered  by  professional  programmers  in  each  of 
these areas. Appendix 6.3 has some sample program listings.
All nine programs used in this experiment were compiled and
executed using appropriate test data.

2.4.2 Program Structure

Three levels of program structure were defined. The
structured level adhered strictly to the tenets of
structured programming (Dijkstra, 1972). Program flow
proceeded from top to bottom with one entry and one exit.
Neither backward transfer of control nor arithmetic IF's
were allowed.

Awkward  constructions  may  occur  in  FORTRAN,  when  the 
rules  for  structured  programming  are  applied  rigorously. 
These  include  necessary  but  artificial  GO  TO's  and  DO  loops 
with dummy variables (Tenny, 1974). These awkward
constructions  were  largely  eliminated  in  the 
quasi-structured  level,  where  a more  natural  control  flow 
was  allowed.  A judicious  use  of  backward  GO  TO  statements 
and  multiple  exits  was  permitted.  IF  statements  were  again 
restricted to assignment and logical IF's.

In the unstructured version of each program the control
flow was not straightforward. The GO TO statement occurred
more frequently and backward transfer of control was not
restricted. The three-way transfer of control statement
(arithmetic IF) was allowed only at this level (Table 2).

TABLE 2

CONTROL STRUCTURES ALLOWED IN THE THREE LEVELS OF COMPLEXITY

Two well-known metrics for program complexity were also
calculated for each of the programs in order to determine
their correspondence with the working definitions. The
metrics selected were: McCabe's V(G) (McCabe, 1976) and
Halstead's E (Halstead, 1977).

2.4.2.1 Halstead's E

In Halstead's theory of software science, the amount of
effort required to generate a program (E) can be calculated
from simple counts of the actual code. The calculations are
based on four quantities: 1) the numbers of distinct
operators and distinct operands, and 2) the total numbers of
occurrences of operators and of operands. From these
quantities, Halstead derives the number of mental
comparisons required to generate a program.

Since  different  programming  languages  produce  varying 
numbers  of  instructions,  the  number  of  elementary  mental 
discriminations  for  each  mental  comparison  varies  with  the 
language  used.  When  a correction  is  made  to  account  for 
these  differences,  one  can  define  E in  terms  of  the  number 
of  mental  discriminations  per  program;  i.e.,  the  number  of 
comparisons  in  the  program  multiplied  by  the  average  number 
of  mental  discriminations  made  per  comparison.  A discussion 
of  the  computational  formula  can  be  found  in  Fitzsimmons  and 
Love (in press) or Halstead (1977).
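
As a rough illustration of how E follows from the four counts, the sketch below uses the commonly cited software science equations: volume V = N log2(n), estimated program level L = (2/n1)(n2/N2), and effort E = V/L. The report itself does not reproduce the formula, so whether the program used in this study computed E in exactly this way is an assumption, and the function name and example counts are hypothetical.

```python
import math


def halstead_effort(n1, n2, N1, N2):
    """Return (volume, estimated_level, effort) from the four basic counts.

    n1, n2: number of distinct operators and distinct operands.
    N1, N2: total occurrences of operators and of operands.
    """
    vocabulary = n1 + n2                      # n
    length = N1 + N2                          # N
    volume = length * math.log2(vocabulary)   # V = N log2(n)
    level = (2.0 / n1) * (n2 / N2)            # estimated program level
    effort = volume / level                   # E = V / L
    return volume, level, effort


# Hypothetical counts chosen so the arithmetic is easy to follow:
# n = 4, N = 8, V = 8 * log2(4) = 16, L = 0.5, E = 32.
volume, level, effort = halstead_effort(n1=2, n2=2, N1=4, N2=4)
```

Because the estimated level falls as operands are reused more often, two programs of equal length can differ substantially in E.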

All  software  science  metrics  were  computed  precisely 
from  a program  (based  on  Ottenstein,  1975)  which  had  as 
input  the  source  code  listings  of  27  programs  (9  separate 
programs at each of three levels of structure).
2.4.2.2 McCabe's V(G)

McCabe's metric is the classical graph-theory
cyclomatic number, defined as V(G) = # edges - # nodes + #
connected regions. Because the McCabe measure is defined
only  for  programs  that  adhere  strictly  to  the  rules  of 
structured  programming,  some  modifications  to  the  metric 
were  necessary  in  order  to  evaluate  the  less  structured 
control  flow  versions.  (See  Appendix  6.4  for  a description 
of  these  modifications).  All  experimental  programs  were 
checked before the experiment to ensure that the most
complex version of each program had the highest McCabe value and
the  least  complex  version  had  the  lowest  value. 
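
The definition above can be sketched for a small control flow graph. This is a minimal illustration of the unmodified formula only: the graph, the node names, and the closing edge from exit back to entry (which makes the graph strongly connected) are assumptions, and the modifications described in Appendix 6.4 are not represented.

```python
def cyclomatic_number(nodes, edges):
    """Return edges - nodes + connected components for a flow graph."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the representative of the component.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the components joined by each edge (direction is ignored).
    for source, target in edges:
        parent[find(source)] = find(target)

    components = len({find(node) for node in nodes})
    return len(edges) - len(nodes) + components


# Hypothetical IF-THEN-ELSE: entry branches to two arms that rejoin at exit.
nodes = ["entry", "then", "else", "exit"]
edges = [
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
    ("exit", "entry"),  # assumed closing edge from exit back to entry
]
v_g = cyclomatic_number(nodes, edges)  # 5 edges - 4 nodes + 1 component
```

For this graph the result is 2, which agrees with the familiar rule of thumb that V(G) equals the number of binary decisions plus one.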

2.4.3  Variable  Name  Mnemonicity 

Three  levels  of  mnemonicity  for  variable  names  were 
manipulated  independently  of  program  structure  levels. 
Because meaningfulness is difficult to assign arbitrarily, a
preliminary assessment was done. The nine programs were
modified so that the variable names were V1, V2, ..., VN.
Professional programmers were presented the programs and
descriptions of their purpose. They were asked to
substitute meaningful names for the 'V' names. The names
most often generated were used in the most mnemonic
condition. The moderately mnemonic level consisted of less
frequently chosen names. In the least mnemonic condition
names consisted of a randomly chosen letter such that all
real variables began with A through H or O through Z, and
all integer variables began with I through N. In the few
cases where there were more than six integer names, the
letter was followed by a single digit (e.g., I2). Counters
in DO loops often had identical names in all three mnemonic
versions (See Appendix 6.5), since long mnemonic variable
names are rarely used as counters in programs.
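
The naming rule for the least mnemonic condition can be sketched as below. The function name, the use of distinct letters for each variable, and the way digit suffixes are handed out once the six integer letters are exhausted are all assumptions made for illustration; the report states only the letter ranges and the single-digit suffix.

```python
import random

# FORTRAN implicit typing: names starting with I through N are INTEGER,
# all other starting letters are REAL.
INTEGER_LETTERS = "IJKLMN"
REAL_LETTERS = "ABCDEFGH" + "OPQRSTUVWXYZ"


def least_mnemonic_names(n_real, n_integer, rng):
    """Return (real_names, integer_names) for the least mnemonic condition."""
    if n_real > len(REAL_LETTERS) or n_integer > 54:
        raise ValueError("too many variables for this sketch")
    real_names = rng.sample(REAL_LETTERS, n_real)
    letters = list(INTEGER_LETTERS)
    rng.shuffle(letters)
    integer_names = letters[:n_integer]
    # Beyond six integer names, reuse a letter followed by one digit (e.g., I2).
    for i in range(max(n_integer - len(letters), 0)):
        integer_names.append(f"{letters[i % 6]}{2 + i // 6}")
    return real_names, integer_names


# Seeded generator so the illustration is repeatable.
reals, integers = least_mnemonic_names(5, 8, random.Random(1))
```

With eight integer variables, six receive a bare letter and the remaining two receive a letter plus a digit, so every name still obeys the implicit typing convention.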

2.5  Covariates 

In order to obtain a measure which was assumed to be
related to programming ability, all participants were
required to perform the same preliminary task. A short
program was given to each participant to study and then
reconstruct. Their scores on this task were used as a
covariate to measure individual performance differences.
Participants were also asked their type of programming
experience and the number of years they had been programming
professionally. Situational covariates included the
sequence of presentation and the specific program.

2.6  Dependent  Variable 

All  warm-up  programs  were  scored  by  the  same  grader. 
The  remaining  108  experimental  programs  were  scored 
independently  by  three  graders.  The  criterion  for  scoring 
the  programs  was  the  functional  correctness  of  each 
separately  reconstructed  statement.  Variable  names  and 
statement  numbers  which  differed  from  those  in  the  original 
program  were  counted  as  correct  when  used  consistently.  All 
errors  were  classified  as  either  syntactical  or  logical. 
Only one error of each type was counted per statement, even
though multiple syntactical and logical errors could occur
in the same statement. Control structures could be
different from the original program as long as the statement
performed the same function.

Because it is difficult to prove the equivalence of two
versions of the same program, function, or statement, three
judges  scored  each  program  independently.  Interjudge 
correlations  of  .96,  .96,  and  .94  were  obtained  across  the 
three  sets  of  scores.  The  average  of  the  three  scores 
(percents  of  statements  correctly  reconstructed)  for  each 
program  was  used  as  the  dependent  variable  in  the  data 
analysis. 
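
The scoring procedure can be illustrated with a small sketch that computes the pairwise interjudge correlations and the averaged dependent variable. The scores below are invented for illustration and are not data from the study; the helper name `pearson` and the grader labels are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percent-correct scores from three graders on five programs.
grader_scores = {
    "A": [40.0, 55.0, 62.0, 30.0, 71.0],
    "B": [42.0, 53.0, 60.0, 33.0, 70.0],
    "C": [38.0, 57.0, 65.0, 29.0, 73.0],
}

# One correlation per pair of graders.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
interjudge = {
    pair: pearson(grader_scores[pair[0]], grader_scores[pair[1]])
    for pair in pairs
}

# Dependent variable: mean of the three graders' scores for each program.
dependent = [sum(scores) / 3 for scores in zip(*grader_scores.values())]
```

Averaging across graders reduces the influence of any one grader's judgment about functional equivalence on the score used in the analysis.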

2.7  Analysis 

The  analysis  of  results  was  conducted  in  two  phases. 
The  first  phase  was  an  experimental  test  of  the  programming 
style  variables,  while  the  second  phase  was  an  evaluation  of 
the  software  complexity  metrics. 

The  first  phase,  involving  an  experimental  test  of 
programming  practices,  was  analyzed  in  a hierarchical 
regression  analysis.  In  this  analysis,  domains  of  variables 
were  entered  sequentially  into  a multiple  regression 
equation  to  determine  if  each  successive  domain 
significantly  improved  the  prediction  of  the  equation 
developed  from  domains  already  entered.  Thus,  the  order 
in which domains were entered into the analysis was
important. In this study effects related to pre-existing
differences  among  participants  and  programs  were  entered 
into  the  analysis  prior  to  evaluating  the  effects  of 
programming  styles.  The  variable  domains  were  entered  in 
the  following  order: 

Differences related to participants and programs

   1) Pretest scores
   2) Class of program
   3) Specific program

Programming styles

   4) Program structure
   5) Variable mnemonicity
   6) The interaction between program structure and
      variable mnemonicity.

The  variables  representing  the  different  conditions  of 
domains  2 through  5 were  effect  coded  (Kerlinger  & Pedhazur, 
1973). 
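
Effect coding can be sketched as follows for one three-level domain. A domain with k levels is represented by k - 1 coded variables, which matches the 2 degrees of freedom reported for the three-level domains. Which level receives the -1 codes is an arbitrary choice made here for illustration, and the function name is hypothetical.

```python
def effect_code(level, levels):
    """Code one observation of a k-level factor as k - 1 variables.

    Each of the first k - 1 levels gets a 1 on its own variable and 0
    elsewhere; the last level gets -1 on every variable.
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == other else 0 for other in levels[:-1]]


STRUCTURE_LEVELS = ["structured", "quasi-structured", "unstructured"]
coded = {level: effect_code(level, STRUCTURE_LEVELS) for level in STRUCTURE_LEVELS}
# structured       -> [1, 0]
# quasi-structured -> [0, 1]
# unstructured     -> [-1, -1]
```

With equal numbers of observations per level, each coded variable sums to zero, so the regression weights are expressed as deviations from the grand mean.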

The second phase of analysis investigated relationships
among Halstead's E, McCabe's V(G), number of statements in
the program, and performance. Analysis consisted of
examining correlations among the measures in both the raw
data and data corrected for differences among participants
and programs.
3.1 Individual Differences among Participants

Data  presented  in  Table  3 indicate  that  scores  on  the 
pretest were significantly related to the percent of
statements correctly reconstructed during the experiment.
Pretest scores accounted for 17% of the variance in
performance, while no relationships were observed for type
or  length  of  programming  experience.  The  two  statistical 
programmers  recalled  more  statements  correctly  than 
engineering  or  non-numeric  programmers,  but  generalization 
is  not  possible  from  such  a limited  sample.  Further,  job 
location  was  not  related  to  performance. 

3.2 Differences among Programs

A mean  of  50%  of  the  statements  were  correctly  recalled 
across  all  programs  and  experimental  conditions.  While  this 
was  a preferred  level  for  mean  task  difficulty,  there  were 
substantial  differences  in  difficulty  among  the  various 
programs.  As  evident  in  Table  3,  performance  differed 
significantly  as  a function  of  the  class  of  program.  These 
differences  accounted  for  8%  of  the  variance  in  performance 
in  addition  to  that  accounted  for  by  individual  differences 
among  participants.  Engineering  programs  were  the  most 
difficult (41% of the statements correctly recalled),
followed by statistical (52%) and non-numeric (57%).

When the specific program was taken into account an
additional 20% of the variance in performance was
explained. However, this result is not strictly a function of
differences among programs, because variance related to
specific programs was confounded with variance related to
participants. Overall, 45% of the variance in performance
was accounted for by differences among participants and
general program characteristics.

TABLE 3

HIERARCHICAL REGRESSION ANALYSIS

VARIABLE DOMAIN                  DOMAIN R² (a)   df   ΔR² (b)

1) PRETEST                       .17**            1   .17**
2) CLASS OF PROGRAM              .09**            2   .08**
3) SPECIFIC PROGRAM              .26**            8   .20**
4) PROGRAM STRUCTURE (PS)        .07**            2   .07**
5) VARIABLE MNEMONICITY (VM)     .01              2   .01
6) PS X VM                       .03              4   .03

TOTAL                                            19   .56

Note: N = 108

a Correlations in this column represent the total relationship between
each variable domain and performance. Where there is only one degree
of freedom for a particular domain, figures in this column represent
zero-order correlations, otherwise they represent multiple
correlations for all variables in the domain.

b Figures in this column indicate the percent of variance contributed
to prediction of performance in addition to that afforded by preceding
domains. Significance levels indicate whether this represented a
significant contribution to prediction.
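
The cumulative figures quoted in the text can be checked directly from the increments in explained variance reported in Table 3. The short sketch below only adds up those reported values in their order of entry; the variable names are illustrative.

```python
from itertools import accumulate

# Increments in explained variance, in order of entry (values from Table 3).
increments = [
    ("Pretest", 0.17),
    ("Class of program", 0.08),
    ("Specific program", 0.20),
    ("Program structure", 0.07),
    ("Variable mnemonicity", 0.01),
    ("PS x VM", 0.03),
]

# Running total of variance accounted for after each domain is entered.
cumulative = list(accumulate(delta for _, delta in increments))

for (domain, _), total in zip(increments, cumulative):
    print(f"{domain:<22} {total:.2f}")
```

The running total after the third domain is .45, the figure given for differences among participants and programs, and the total after all six domains is .56.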

3.3  Program  Structure 

Significant  differences  in  performance  were  obtained  as 
a function  of  program  structure.  The  three  levels  of 
structure  accounted  for  7%  of  the  variance  in  performance  in 
addition  to  variance  related  to  differences  among  programs 
and  participants.  As  expected,  the  least  structured  level 
was the most difficult to reconstruct (Figure 2). Contrary
to  the  tenets  of  structured  programming,  however,  the  most 
structured  level  did  not  produce  the  best  performance.  A 
greater  percent  of  statements  were  recalled  from 
quasi-structured  programs,  conceivably  because  of  their  less 
cumbersome  constructs.  A post  hoc  analysis  (Scheffe,  1959) 
showed  the  means  for  the  quasi-  and  unstructured  programs  to 
be significantly different (p ≤ .05).

FIGURE 2: Mean percent of statements correctly recalled for three
levels of program structure

3.4 Variable Name Mnemonicity

No significant differences in performance were observed
in relation to the three levels of mnemonicity assigned to
variable names. Consequently, variable mnemonicity did not
contribute significantly to the hierarchical regression
equation. Further, no significant interaction was found
between variable mnemonicity and level of structure.

3.5  Order  of  Presentation 

Performance  did  not  differ  as  a function  of  the  order 
in  which  the  programs  were  presented  to  participants, 
suggesting  that  any  learning  process  which  might  have 
affected  the  results  occurred  during  the  pretest  rather  than 
during  the  three  experimental  tasks. 

Since  different  levels  of  variable  mnemonicity  neither 
affected  performance  significantly,  nor  caused  any  change  in 
complexity  metrics  for  a particular  program,  the  data 
reported in this section were averaged over levels of
mnemonicity. Thus, the 27 data points each represent a
value for a specific program at a specific level of
structure.  This  averaging  process  also  reduced  to  some 
extent  the  effect  of  individual  differences  among 
participants  since  each  data  point  is  averaged  across  either 
3 or  6 participants  (9  of  the  conditions  were  repeated  by  an 
additional three participants).
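
The averaging step can be sketched as a simple grouping of scores by program and structure level. The records below are invented for illustration; only the program names BESSEL and CHISQ come from the report, and the function name is hypothetical.

```python
from collections import defaultdict


def average_over_mnemonicity(records):
    """Collapse scores to one mean per (program, structure) cell."""
    cells = defaultdict(list)
    for program, structure, _mnemonicity, score in records:
        # The mnemonicity level is deliberately ignored when grouping.
        cells[(program, structure)].append(score)
    return {cell: sum(scores) / len(scores) for cell, scores in cells.items()}


# Hypothetical records: (program, structure, mnemonicity, percent correct).
records = [
    ("BESSEL", "unstructured", "most", 30.0),
    ("BESSEL", "unstructured", "medium", 40.0),
    ("BESSEL", "unstructured", "least", 50.0),
    ("CHISQ", "structured", "most", 60.0),
    ("CHISQ", "structured", "medium", 70.0),
    ("CHISQ", "structured", "least", 50.0),
]
cell_means = average_over_mnemonicity(records)
```

Applied to all nine programs at three structure levels, this grouping yields the 27 data points used in the correlational analyses.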

3.6.1  Relationships  among  Metrics 

Table 4 presents correlations among the three metrics
of software complexity employed in this study; namely,
length (number of statements), McCabe's V(G), and Halstead's
E. Length and McCabe's V(G) were strongly correlated, while
Halstead's E displayed only moderate relationships with
these other two metrics. Investigation of the scatterplots
(Appendix 6.6) indicated that the correlations for
Halstead's E were weakened by an extreme value for E of
250. With this outlier removed, the correlations of E with the
other two metrics rose to the same level observed in their
intercorrelation.

TABLE 4

Correlations among Metrics of Software Complexity

                        CORRELATIONS
METRIC               LENGTH     MCCABE
MCCABE                .75**
HALSTEAD              .47**      .42*
HALSTEAD (n=26)       .75**      .84**

NOTE: Except where indicated, n = 27.
*p ≤ .05
**p ≤ .01
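
As an illustration of how a single extreme value can depress a
correlation, the following Python sketch computes the Pearson
coefficient with and without one outlying point. The data are invented
for illustration and are not the values from this study; `pearson_r` is
a hypothetical helper, not part of the original analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented (E, V(G)) pairs. The last pair mimics a program whose E is
# extreme while its V(G) stays low.
e_values = [10, 20, 30, 40, 50, 250]
vg_values = [2, 4, 5, 8, 10, 6]

r_all = pearson_r(e_values, vg_values)
r_trimmed = pearson_r(e_values[:-1], vg_values[:-1])
```

With these invented values the coefficient is considerably higher once
the extreme point is dropped, which is the pattern reported above for
Halstead's E.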

3.6.2  Relationships  of  Metrics  with  Performance 

The  percents  of  variance  in  performance  accounted  for 
by  each  complexity  metric  are  presented  in  Table  5.  The 
correlations  underlying  the  data  reported  in  this  section 
were  all  negative  indicating  that  performance  fell  as  the 
level  of  complexity  indexed  by  these  three  metrics 
increased.  Length  and  McCabe's  V(G)  were  moderately  related 
to  performance,  accounting  for  28%  and  20%  of  the  variance, 
respectively.  Halstead's  E was  not  significantly  related  to 
performance.  Investigation  of  the  scatterplot  for  the 
Halstead  result  (Figure  3),  however,  produced  some 
interesting  observations.  There  were  three  data  points 
(circled  in  Figure  3)  which  were  developed  by  averaging 
across  three  participants  who  consistently  outscored  other 
participants  on  both  the  pretest  and  the  experimental 
programs.  Recognizing  the  effect  of  individual  differences 
on  performance,  it  was  likely  that  the  three  data  points 
generated  by  this  group  of  participants  resulted  more  from 
the  failure  of  random  assignment  to  fully  neutralize  the 
effect  of  individual  differences,  than  from  the  experimental 
conditions.  When  these  three  data  points  were  removed  the 
variance accounted for by all three metrics increased.

TABLE 5

Percent of Variance in Performance Accounted
for by Software Complexity Metric

                              PERCENT OF VARIANCE
                        ALL GROUPS     EXCEPTIONAL GROUP
METRIC                    (n=27)        REMOVED (n=24)

RAW SCORES
  LENGTH                  .28**           .37***
  MCCABE'S V(G)           .20**           .26**
  HALSTEAD'S E            .02             .13*

TRANSFORMED SCORES
  LENGTH                  .14*            .42***
  MCCABE'S V(G)           .31***          .48***
  HALSTEAD'S E            .04             .53***

*p ≤ .05
**p ≤ .01
***p ≤ .001

FIGURE 3: PERCENT OF STATEMENTS CORRECTLY
RECALLED VERSUS HALSTEAD'S E
[Figure legend: relationship between performance and E within each
separate program; relationship eliminated when 3 exceptional data
points were withdrawn.]
Note: Numbers in the figure represent the specific program.

With the three points for the exceptional group
removed, the relationships between performance and the
complexity metrics were generally linear within the data for
each specific program (observe the solid lines in Figure 3).
It was apparent from Figure 3, and was confirmed in the
regression analyses, that there were considerable
differences among programs in difficulty and complexity. In
order to determine whether the complexity metrics were more
predictive of performance within programs than across them,
a transformation was applied to the data. The lowest value
obtained for each metric among the three versions of each
specific program was set to zero. Similarly, the percent of
statements correctly recalled for the version with the
lowest value of the particular metric was also set to zero,
so that the scores for the other two versions became
difference scores; the percent recalled could therefore fall
below zero, as it did in several instances. This
transformation represented an attempt to determine whether
performance diminished as a function of increasing
complexity when initial differences among programs had been
removed.
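
The transformation described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions: the
function name `to_difference_scores` and the numbers are invented, and
ties for the lowest metric value are resolved by taking the first such
version, a detail the text does not specify.

```python
def to_difference_scores(versions):
    """Re-express one program's versions relative to its baseline version.

    versions: list of (metric_value, percent_recalled) pairs, one per
    structural version of a single program. The version with the lowest
    metric value becomes (0, 0); the others become difference scores.
    """
    base_metric, base_recall = min(versions, key=lambda v: v[0])
    return [(m - base_metric, r - base_recall) for m, r in versions]


# Invented values for the three versions of one program.
program = [(12, 55.0), (9, 48.0), (15, 40.0)]
transformed = to_difference_scores(program)
```

In this invented case the version with the highest metric value ends up
with a negative recall score, showing how the percent recalled can fall
below zero after the transformation.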

The percents of variance in performance accounted for
by each of the transformed metrics are presented in Table 5
for samples with and without the exceptional group of
participants removed. The effect of the transformation on
the results for length was not great. However, substantial
increases were observed for both McCabe's V(G) and
Halstead's E. Thus, while Halstead's E accounted for only
2% of the variance in performance among the raw scores, it
accounted for 53% after some corrections were made for
differences among programs and participants.

4.0  DISCUSSION

Three factors were found to influence programmers'
ability to correctly recall programs they had previously
studied. These factors included individual differences
among participants, characteristics of specific programs,
and the level of program structure. Each of these factors
contributed separately to the prediction of performance on
the task studied here. In addition, several metrics of
software complexity were found to predict understanding
under certain conditions.

Individual differences among programmers represent an
important topic which has been mentioned more frequently
than studied. Such differences (as measured by a pretest)
accounted for almost one-fifth of the variance in
performance. The effect of individual differences might be
even greater were the sample expanded beyond the
experienced, professional programmers studied here. The
size of the effect for these differences suggests that an
important area for future research will be in identifying
strategies for the selection, placement and training of
computer programmers.

Performance effects due to differences among programs
are not easily explained from these results. While there
was a significant effect due to class of program
(engineering vs. non-numeric vs. statistical), this result
may have been a function of some experiential or familiarity
factor particular to the sample studied. Further, effects
due to differences among specific programs were confounded
in this experimental design with effects related to
individual differences among participants. Nevertheless,
the suggestion of substantial differences in the ease of
understanding based on type of program is interesting in
light of the limited range of programs employed here.

In  order  to  properly  analyze  subtle  participant  X 
program  interactions,  more  participants  would  be  required  in 
each  cell  of  this  experimental  design.  Such  effects  may 
involve  interactions  between  the  nature  or  purpose  of  the 
program  and  some  experiential  or  cognitive  factor  among 
programmers.  Such  interactions  may  hold  important 
implications  for  the  management  of  software  development 
projects,  especially  regarding  the  assignment  of 
programmers.

The  characteristics  of  programs  studied  in  this 
experiment  included  two  principles  of  programming  style  and 
several  metrics  of  software  complexity.  Of  the  programming 
style  variables  examined,  only  level  of  structure  had  an 
effect  on  performance.  Results  confirmed  that  well 
structured  programs  were  easier  to  understand  than 
unstructured  ones.  Yet,  adhering  strictly  to  the  rules  of 
structured  programming  in  FORTRAN  often  caused  clumsy 
constructions  that  were  no  more  effective  than  taking 
certain liberties with the structure to create cleaner
code. These liberties included multiple returns and
judiciously placed backward GO TO statements; both are
violations of the rules for structured coding.

No  differences  in  performance  were  observed  as  a 
function  of  the  mnemonic  value  of  variable  names.  However, 
many  participants  evidenced  a preference  for  mnemonic  names, 
in  that  they  used  their  own,  more  meaningful  names  when 
rewriting  the  least  mnemonic  versions  of  the  programs.  For 
the  medium  and  most  mnemonic  versions  they  tended  to  use  the 
names supplied in the original programs. Thus, the
importance of mnemonic variable names is supported by
anecdotal rather than statistical evidence.

In  addition  to  the  experimental  evidence  that  the  level 
of  program  structure  affects  understanding,  there  was 
correlational  evidence  that  understanding  was  related  to  the 
psychological  complexity  of  the  program.  The  best  predictor 
of  performance  in  the  raw  data  among  the  three  complexity 
metrics  studied  was  the  total  number  of  statements  in  the 
program.  However,  the  considerable  amount  of  variance  in 
this  study  related  to  differences  among  individuals  and 
programs  may  have  masked  relationships  between  performance 
and  complexity  metrics,  especially  Halstead's  E.  When  the 
data  were  transformed  in  an  attempt  to  remove  some  of  the 
effects  due  to  differences  among  programs  and  exceptional 
programmers,  the  Halstead  and  McCabe  metrics  were  found  to 
be substantially better predictors of performance than they
had been in the untransformed data. This result is
consistent with the finding in a pilot study (Sheppard &
Love, 1977) that Halstead's E was highly correlated with
performance.

In essence, the relationships of the Halstead and
McCabe metrics with performance were approximately linear
within different versions of a single program. This
observation implies that prediction of the number of bugs in
a program by the Halstead and McCabe metrics alone may not
prove adequate. Prediction may be substantially improved by
including some variable(s) relating to differences among
programs or programmer X program interactions. This issue
will be addressed by future studies in this research
program.

An interesting problem in the Halstead metric was
observed in a program which generated a Halstead value of
250 and a McCabe value of only 6. Inspection of the program
showed a series of assignment statements of the form:

IF (AMT2.LT.AMT3) MAXPHI = SQRT((AMT2/AMT1) + (AMT4/AMT3))

Although such statements resulted in a high E value, they did
not affect the control flow. Relationships between
performance and different ways of computing complexity
metrics will be addressed as data are collected in
additional experiments from this research program. These
additional data will provide a more comprehensive base from
which to test hypotheses regarding these metrics.
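
To illustrate why a statement of this form raises E while adding little
to V(G), the sketch below computes Halstead's effort in its commonly
cited form, E = V / L, with volume V = N log2(n) and estimated level
L = (2 / n1)(n2 / N2). The token lists are one hand tokenization chosen
for illustration (parentheses are not counted); counting conventions
vary, so the resulting number is not comparable with the values
reported in this study, and `halstead_effort` is a hypothetical helper.

```python
import math


def halstead_effort(operators, operands):
    """Halstead's E = V / L-hat computed from lists of tokens.

    n1, n2 are the counts of distinct operators and operands;
    N1, N2 are the total occurrences of each.
    """
    n1, n2 = len(set(operators)), len(set(operands))
    total_ops, total_opnds = len(operators), len(operands)
    volume = (total_ops + total_opnds) * math.log2(n1 + n2)
    level_estimate = (2.0 / n1) * (n2 / total_opnds)
    return volume / level_estimate


# Assumed tokenization of:
#   IF (AMT2.LT.AMT3) MAXPHI = SQRT((AMT2/AMT1) + (AMT4/AMT3))
operators = ["IF", ".LT.", "=", "SQRT", "/", "+", "/"]
operands = ["AMT2", "AMT3", "MAXPHI", "AMT2", "AMT1", "AMT4", "AMT3"]

effort = halstead_effort(operators, operands)

# Under the counting rules in the appendix on control flow, a logical IF
# adds 1 to V(G) no matter how many operators and operands it contains.
vg_increment = 1
```

Every additional operator or operand in the expression increases E,
whereas the contribution of the statement to V(G) stays fixed.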


Finally, it should be noted that smaller, single factor
experiments would have precluded analysis of a number of the
effects reported in this study. In particular, implications
that individual and/or program factors may need to be
included in predictions of psychological complexity may not
have emerged had this study included only three rather than
nine separate computer programs (cf. Sheppard & Love, 1977).

INSTRUCTIONS TO PARTICIPANTS


GOOD MORNING!!


Today  we  are  going  to  ask  you  to  participate  in  an  experiment  which 
we  hope  will  be  both  entertaining  and  challenging. 

This  work,  sponsored  by  the  Office  of  Naval  Research,  is  being  done 
to  make  computer  programs  more  readable.  To  do  this,  we  need  to  measure  your 
understanding  of  a particular  program.  We  will  give  you  three  separate  programs 
and  ask  you  to  study  each  program  carefully,  then  reconstruct  that  program  from 
memory  without  any  notes. 

In  previous  research,  we  have  discovered  that  recoding  a program  from 
memory  is  a very  sensitive  measure  of  understanding  of  a program. 

Our purpose is to evaluate different ways of writing a computer program.
It is not to evaluate computer programmers. Your performance on a program will
be compared only to your performance on other programs. Your only competition
is yourself. All programs and papers that you will be handed are carefully
numbered so it is not necessary for you to put your name on any of these.


We would like you to answer the following questions for our research
purposes:

1.  How long have you been programming in FORTRAN professionally?
years  months

2.  Please circle one of the following: Has your primary experience
been in Engineering, Statistical or Non-numeric programs?

During this experiment, each of you will be working on different
programs. If someone else seems to finish earlier than you, don't be concerned.
They will have been working on something else entirely which might not require as
much time.

We will begin this morning with a simple test program. We will ask you
to study this FORTRAN program for ten minutes. During this time, you may write
anything, draw a flow chart, or make any notes to help you understand and memorize
the program. When the 10 minutes are up, you will be asked to hand in the programs
and any notes. We will then give you 5 minutes to rewrite the program from memory
as best you can. Since we are interested in your understanding of the program, it
is not necessary for you to memorize statements, statement numbers, or variable names
exactly. It's O.K. as long as the program still does the same job.

If there are any questions, please ask them at this time.

Appendix 6.2.1 - A Description of the Statistical Programs


Appendix 6.2.2 - A Description of the Engineering Programs


Appendix 6.2.3 - A Description of the Non-Numeric Programs

APPENDIX 6.3.3

SELECT, CONTROL FLOW LEVEL 3, MNEMONIC LEVEL 1


SCBBODTITO  SELECT(J,B,lI,n,lC,P,Jl,Rl) 

IRTECER  B(J).  0,K.D(26)  ,a.i*,Y,II(26)  ,L 
EXTERRAL  L 

DATA  R/IHA. IHB, IHC, IHD, IBE, IHF. IHC, IBH. IHI. IBJ. 

1 IHK,  IHL,  IHH,  IHH,  IHO,  IHP,  IHQ.  IHR.  IBS.  IBT.  iHD.  IHV.  IRV, 

2 IHX.  IHY,  IHZ^ 

IF(J.LE.2S)  GO  TO  9B 

p.99 

GO  TO  see 

9e  pae 

DO  lee  i>i,26 

0( I)aR( I) 

lee  CORTIRDE 

DO  120  I«l,25 
Q«U  I.26.J1.H1) 

Y>D(Q) 

D((U«0(  J) 

D<  1)»Y 

12e  CORTIROE 

DO  14e  I-l.J 
B( I)«D(  1) 

14e  CORTIRDE 

IFCRARCJl.Hl) .CT.e.S)  GO  TO  200 
M « 1 

K>L(  l.J.Jl.Hl) 

D«B(K) 

GO  TO  see 

2ee  Ki«j-»-i 

K«L(C1.26.J1.H1) 

D«D(K> 

H « 2 
K«e 

see  RETDRR 

ERD 

IRTBCER  FURCnOR  Ull,  Rl,  Jl.Kl) 

IRTOGER  II,  Rl.  Jl,  HI 
IF  (Rl.Ce.  ID  GO  TO  10 
lY  « HI 
Rl  • II 
II  ■ lY 

10  R » Rl  - II  ♦ I 

G ■ RAR  (Jl.  Hl> 

L « IFIX  (C  » R ♦ FIjOAT  (ID) 

      END

APPENDIX 6.3.1

INTEG, CONTROL FLOW LEVEL 1, MNEMONIC LEVEL 3


      SUBROUTINE QATR(LBOUND,UBOUND,ABSERR,NDIM,FNCT,RESULT,ERROR,ARRAY)
      DIMENSION ARRAY(NDIM)
      INTEGER ERROR
      REAL LBOUND, INCRE
      ARRAY(1)=.5*(FNCT(LBOUND)+FNCT(UBOUND))
      WIDTH=UBOUND-LBOUND
      IF(NDIM-1)90,90,10
   10 IF(WIDTH)20,110,20
   20 HAFWID=WIDTH
      ERRMED=ABSERR/ABS(WIDTH)
      CON1=0.
      VAL1=1.
      INDEX3=1
      DO 80 I=2,NDIM
      RESULT=ARRAY(1)
      CON2=CON1
      INCRE=HAFWID
      HAFWID=.5*HAFWID
      VAL1=.5*VAL1
      ARGUM=LBOUND+HAFWID
      DLTX=0.
      DO 30 J=1,INDEX3
      DLTX=DLTX+FNCT(ARGUM)
      ARGUM=ARGUM+INCRE
   30 CONTINUE
      ARRAY(I)=.5*ARRAY(I-1)+VAL1*DLTX
      XTRAPL=1.
      INDEX1=I-1
      DO 40 J=1,INDEX1
      INDEX2=I-J
      XTRAPL=4.*XTRAPL
      ARRAY(INDEX2)=ARRAY(INDEX2+1)+(ARRAY(INDEX2+1)-ARRAY(INDEX2))/
     1(XTRAPL-1.)
   40 CONTINUE
      CON1=ABS(RESULT-ARRAY(1))
      IF(I-3)70,50,50
   50 IF(CON1-ERRMED)110,110,60
   60 IF(CON1-CON2)70,120,120
   70 INDEX3=2*INDEX3
   80 CONTINUE
   90 ERROR=2
  100 RESULT=WIDTH*ARRAY(1)
      RETURN
  110 ERROR=0
      GO TO 100
  120 ERROR=1
      RESULT=WIDTH*RESULT
      RETURN
      END
      FUNCTION FNCT(ARGUM)
      FNCT=1./(2.+ARGUM)
      RETURN
      END


APPENDIX 6.3.2

CHISQ, CONTROL FLOW LEVEL 2, MNEMONIC LEVEL 2


STOROUTIHE  CHISa(IUT,lf,tt,CS,DEC,ElUl.KrOT,CTOT) 
inTEGEH  EIUl.DEC,PTR 
R£AL  HAT 

DIHEKSIOH  MAT(  IM)  .HTOTClf)  ,CTOT(H) 

EBR*» 

cs>e.e 

DEG><II-1)«(H-1) 

IF  (DEG  .GT.  e)  GO  TO  le 

EBR»2 

RETDRlf 

DO  29  i>i,n 

RTOT( 1)»9.9 

PTRa I-H 

DO  29  J»1,H 

PTR*  PTR+W 

RTOT(  r)*RTOT(  I)+HAT(PTIO 

COHTIirDE 

PTR»9 

DO  39  J«1,H 
CT0T(J)«9.9 
DO  39  lal.n 

PTRaPTR+I  . - - 

CTOT(  J)  »CTOT(  J)  +MAT(  PTR) 

COimiTOE 
0T0T»9.9 
DO  49  I«1,1T 
GTOT«GTOT+RTOT( I) 
coirriiroE 

IF  (IfH  .EQ.  4)  GO  TO  69 
PTR»9 

DO  59  J>1,H 
DO  99  I-l.IT 
Pra*PTR+l 

EXPT»RTOT(  I)»CTOT(J)/'CTOT 
IF  (EXPT  .LT.  1.9)  ERR>1 

CS=CS+(  MAT(  PTR) -EXPT) *(  MAT( PTR) -EXPT)  /TXPT 
COHTIlfTJE  / 

RETDRlf 

CSaGTOT»(  ABS;  MATf  1 ) »HAT(  4) -MAT(2) *HAT( 3)  ) -(TTOT/R. 9) 

/(  (rrOT(  1 ) *CTOT(  2)  »RTOT(  1 ) SRTDTC  2)  ) 

RETDRlf 

      END

MEASURING COMPLEXITY OF CONTROL FLOW

In developing a metric for software complexity, one
approach might consider the number of statements in a
program, thus equating length and complexity. A slightly
more sophisticated measure is the percent of statements that
affect control flow. A Bell Telephone Laboratories study
(Davis, Dickman, Kouni, & Amster, 1976) used this metric on
a large number of programs. Its weakness is that the
measured complexity can remain constant as the size of the
program increases.
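
The weakness of the percent measure can be seen in a small Python
sketch. The keyword list and the prefix matching are simplifying
assumptions made for illustration (a prefix test would, for instance,
miscount a declaration beginning with DOUBLE), and
`percent_control_flow` is a hypothetical helper rather than the
procedure used in the cited study.

```python
def percent_control_flow(statements, control_keywords=("IF", "DO", "GO TO")):
    """Percent of statements that begin with one of the given keywords."""
    hits = sum(
        1 for s in statements
        if s.lstrip().upper().startswith(control_keywords)
    )
    return 100.0 * hits / len(statements)


small = ["X = 1", "IF (X .GT. 0) Y = 2", "Z = X + Y", "GO TO 10"]
large = small * 5  # five times as long, same mix of statements

pct_small = percent_control_flow(small)
pct_large = percent_control_flow(large)
```

Both lists yield the same percentage even though one is five times
longer than the other.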

To  assign  a metric  to  control  flow  complexity,  we  must 
examine  the  elementary  control  structures  of  a program. 
This  requires  breaking  the  program  down  into  elementary 
building  blocks,  assessing  the  complexity  of  each  block,  and 
then  combining  these  assessments  into  higher  level 
components. 

Halstead  (1975)  accomplished  this  decomposition  and 
synthesis  by  choosing  operands  and  operators  as  the  smallest 
conceptual  units  to  develop  E,  his  measure  of  the  complexity 
of  a program. 

At  a more  abstract  level,  we  can  define  statements  and 
groups  of  statements  which  represent  cognitive  blocks  (or 
chunks)  to  a programmer  (e.g.,  DO,  GO  TO).  These  blocks  are 
probably  more  representative  of  the  way  people  manipulate 
concepts than the smaller units Halstead uses. Ramsey
(1977) is currently involved in experimental studies to test
this assertion.


Another attempt to work at this more abstract level is
shown by McCabe (1976), who has defined complexity in
relation to the decision structure of the program. He
ignores the data structure totally. His complexity metric,
V(G), is the classical graph-theory cyclomatic number
defined as: # edges - # nodes + # connected regions. Simply
stated, he counts the number of basic control paths through
a computer program.

The simplest program possible would have V(G) = 1.
Sequences do not add to the complexity. IF-THEN-ELSE is
valued as 2, increasing the complexity by 1. A DO or DO
WHILE is also 2, the assumption being that there are really
only two control paths, the straight path through the DO and
the return to the top, regardless of the number of times
executed. Clearly a DO executed 25 times is not 25 times
more complex than a DO executed once.

McCabe's method is explained only for structured
programs. In order to compute the metric for unstructured
programs, several alterations were made. An additional
RETURN was counted as an extra path in each case, keeping
the cyclomatic number the same as a "GO TO end" would have.
For statements of the form: IF ( ) 100, 200, 300, the
complexity was increased by 2 as opposed to the logical IF,
which increases the complexity by 1. These are small
changes which appear to be reasonable extensions of McCabe's
theory. However, one question which arises is the case of
the arithmetic IF where two paths are the same:

IF ( ) 100, 100, 200

Should this add 1 or 2 to the complexity? In order to
standardize the procedure, it was counted as the standard
arithmetic IF with 2 added to the V(G) metric.
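
The counting rules described above can be sketched as a short Python
routine. This is a counting shortcut that follows the stated rules, not
a construction of McCabe's flow graph; the pattern matching is
deliberately crude (for example, an assignment to a variable whose name
begins with IF would be miscounted), and `cyclomatic_estimate` is a
hypothetical name introduced for illustration.

```python
import re

ARITHMETIC_IF = re.compile(r"^IF\s*\(.*\)\s*\d+\s*,\s*\d+\s*,\s*\d+$")
DO_LOOP = re.compile(r"^DO\s+\d+\s")


def cyclomatic_estimate(statements):
    """Estimate V(G) from a list of FORTRAN statements (one per string)."""
    vg = 1        # the simplest possible program
    returns = 0
    for raw in statements:
        # Drop a leading statement label, if any, then normalize case.
        stmt = re.sub(r"^\s*\d+\s+", "", raw).strip().upper()
        if ARITHMETIC_IF.match(stmt):
            vg += 2   # arithmetic IF, even when two targets coincide
        elif stmt.startswith("IF"):
            vg += 1   # logical IF
        elif DO_LOOP.match(stmt):
            vg += 1   # DO loop
        elif stmt == "RETURN":
            returns += 1
    # Each RETURN beyond the first counts as one extra path.
    return vg + max(returns - 1, 0)


example = [
    "IF (N) 100, 100, 200",
    "DO 30 J = 1, N",
    "S = S + X(J)",
    "IF (S .GT. 100.0) GO TO 40",
    "30 CONTINUE",
    "RETURN",
    "40 S = 0.0",
    "RETURN",
    "END",
]
vg_example = cyclomatic_estimate(example)
```

For the invented fragment, the arithmetic IF contributes 2, the DO and
the logical IF contribute 1 each, and the second RETURN contributes 1,
on top of the base value of 1.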

A limitation  of  McCabe's  measure  is  that  it  does  not 
deal  with  an  important  feature  that  may  affect  program 
complexity.  There  is  no  provision  for  considering  the  level 
of  nesting  in  various  constructions.  For  example,  the 
complexity  of  three  DO  loops  in  succession  would  be  rated 
exactly  the  same  as  three  DOs  that  are  nested.  Possibly  at 
some  later  time  it  will  be  decided  that  these  two  conditions 
have the same complexity, but at this time it seems rash to
exclude nesting as a major contributor to
complexity. Presently, many programming shops limit the
degree  to  which  nesting  is  allowed  because  managers  feel  it 
causes  problems. 
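
The point about nesting can be made concrete with a small Python
sketch. Each loop is represented only by the list of loops nested
inside it, which is a simplification chosen for illustration. The
maximum nesting depth is shown merely as one hypothetical quantity that
separates the two cases; it is not a metric proposed in this report.

```python
def count_loops(blocks):
    """Total number of DO loops in a list of (possibly nested) loops."""
    return sum(1 + count_loops(inner) for inner in blocks)


def max_depth(blocks):
    """Deepest level of loop nesting (0 when there are no loops)."""
    return max((1 + max_depth(inner) for inner in blocks), default=0)


sequential = [[], [], []]   # three DO loops in succession
nested = [[[[]]]]           # three DO loops, each inside the previous one

# Each DO adds 1 to the base value of 1, wherever it appears.
vg_sequential = 1 + count_loops(sequential)
vg_nested = 1 + count_loops(nested)
```

Both arrangements receive the same V(G), while their nesting depths
differ.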

Sullivan  and  his  associates  (Bell  & Sullivan,  1974; 
Sullivan,  1973)  at  the  MITRE  Corporation,  have  incorporated 
the  effects  of  nesting  levels  into  a quantitative  measure  of 
complexity.  Like  McCabe,  Sullivan  works  with  a program  flow 
graph,  making  his  metric  independent  of  the  programming 
language  used.  Sullivan  breaks  the  code  into  units  such 
that he can define a "local complexity" at any point as the
number of "active concepts" one must consider at that point
in the program. He suggests several ways of combining the
local complexities into an overall complexity measure for
the program: for example, summing the local complexities or
taking the largest local complexity.

The  problem  with  this  metric  is  that  it  is  complicated 
to  compute  and  has  been  implemented  only  in  JOVIAL  to  scan 
JOVIAL  code.  Hand  calculation  would  be  extremely  tedious 
and  probably  error-prone  for  any  non-trivial  program.  It  is 
not  even  clear  that  the  decompositions  are  unique  in  all 
cases.  Were  it  discovered  to  be  a good  predictor  of 
complexity,  it  would  still  take  machine  implementation  in 
several  languages  to  get  people  to  use  it.  A similar  metric 
which  can  be  easily  computed  by  machine  has  recently  been 
described  by  Richards  (1976).

Reiter  (1977)  has  developed  a new  metric,  designed  to 
eliminate  the  problems  discussed  above.  He  uses  the  same 
rating  scheme  for  the  three  basic  structures  described 
above,  but  in  addition,  assignment  statements  are  accounted 
for  in  the  complexity  metric.  He  represents  a program  as  a 
group  of  nested  boxes.  Complexity  is  evaluated  from  the 
innermost  part  of  the  nest  outward,  adding  a weighting 
factor  for  each  escape  to  a higher  level.  This  appears  to 
be  a reasonable  approach.  However,  the  metric  is  in  the 
development  stage  and  has  not  been  tested.  It  is  therefore 
difficult to decide whether the allocation of values to the

APPENDIX 6.5.1

MNEMONIC VARIABLE NAMES FOR THE STATISTICAL PROGRAMS